† Corresponding author. E-mail:
Coarse-grained (CG) simulations can more efficiently study large conformational changes of biological polymers but usually lose accuracies in the details. Lots of different hybrid models involving multiple different resolutions have been developed to overcome the difficulty. Here we propose a novel effective hybrid CG (hyCG) approach which mixes the fine-grained interaction and its average in CG space to form a more smoothing potential energy surface. The hyCG approximately reproduces the potential of mean force in the CG space, and multiple mixed potentials can be further combined together to form a single effective force field for achieving both high efficiency and high accuracy. We illustrate the hyCG method in Trp-cage and Villin headpiece proteins to exhibit the folding of proteins. The topology of the folding landscape and thus the folding paths are preserved, while the folding is boosted nearly one order of magnitude faster. It indicates that the hyCG approach could be applied as an efficient force field in proteins.
For the past few decades, molecular dynamics (MD) simulation has become one of the most powerful tools for predicting the property of biopolymers like proteins. The development of a high precision empirical all-atom (AA) force field allows investigation of the large structural change in at least small proteins. With the help of the specially designed chip, the accessible timescale of MD can even extend to the millisecond nowadays,[1] which reaches or approaches to the experimental interesting time scales of proteins.
Although there has been success in the fine-grained models, large demand in the study of membrane proteins and genetic materials still requires a coarse-grained (CG) model for further extending both length and time scales of simulations. During the last 20 years, vast CG methods have been proposed for different requirements.[2–15] Although these approaches can currently reach relative accuracy and efficiency in understanding the overall property of polymers, liquids, and biological membrane, still, many unsolved problems remain in transferability and thermodynamic consistency.[16] They have limited ability in describing the elaborate structural change in proteins.[17, 18]
Recently, multiscale modelings or hybrid methods have caught a great deal of attention for their ability in preserving the details of protein structure while largely increasing the accessible time scale and length scale of simulations. These methods have been developed into 2 classes: the resolution mixing[19–22] and the resolution scaling.[23–25] The resolution mixing is an actual degree-reducing method, it commonly keeps the high resolution, e.g., the AA, a description for the detailed sensitive region like proteins while the surrounding environment is characterized by low resolution, e.g., the CG. On the contrary, the resolution scaling directly couples the strength of the AA and the CG interaction into a single potential, which is in actuality a fine-grained model but with a modified energy surface. The hybrid interaction in the resolution scaling can be generally written as
In the pure CG model (λ = 1), it is clear that the CG potential should equal to the potential of mean force (PMF) to reproduce the probability distribution of the AA in the CG space. Many methods, by matching such as the average force, thermodynamic observables, the PMF, or the probability density, are designed to construct the CG potential.[4, 14, 15] However, in the hybrid CG models (
In this paper, we intend to propose a novel but economical hybrid approach. This approach can be divided into two stages. The first stage is the same as the normal resolution scaling method but specifies the CG potential as the average of the AA potential. The hybrid potential is much more smoothed so that it is easier to reach equilibrium. It is interesting to note that, the exclusion of entropy in this CG potential is approximately replenished by the existence of the fine-grained interaction in the hybrid model. The second stage is to bridge multiple
The paper is organized as follows. We first show the hybrid protocol in subSection
In this section, we first construct a hybrid potential to approximately keep the PMF in the CG space, then combine multiple hybrid strengths together to automatically tune the fitness between the CG and AA models.
In our approach, first, the λ-hybrid potential
The free energy landscape of the hybrid system in the CG space, i.e., the PMF, is
To ensure the proper rigidity of bond and angle connection, the hybridization usually performs on part of AA potentials, such as the non-bonded term only, which includes the Van der Waals (VDW) potential and the Coulomb interaction, thus the total potential in the hybrid model is
A strongly hybrid potential (large λ) can reduce the atomic friction and speed up the simulation, while a lightly hybrid potential (
In this paper, we illustrate the hybrid CG approach in proteins, where the average potential for each CG pair is fitted by the Lennard–Jones (LJ) 12–6 potential and the Coulomb interaction. To mimic the implicit solvent effect with ions, the Coulomb interaction is modified as screened Coulomb function
However, the closest distance of the LJ potential in MARTINI is a preassigned value, which in our test is found to be more repulsive for polar groups while we integrate it into our hybrid model. To avoid this repulsive force breaking the stability of a native globular protein, any pairs of the CG site within 4 Å in a native structure are excluded from the calculation of the LJ potential. The cut-off 4 Å is chosen to be slightly larger than the distance of two neighbor
The solvent effect in our simulation is treated as GB/SA implicit solvent[31, 32] with surface tension setting to 0.005 kcal/mol/Å2 and α-cutoff adopted 10 Å. The ion concentration in GB/SA is set as 0.2 M and all the simulations are carried out at 300 K and adopting charm22-cmap AA force field.[33, 34] For the AA force field, the cut-off for both VDW and Coulomb is set as 12 Å and we apply a switched form for VDW. We adopt the same force field treatment for our average potential. The
The stability of protein’s native structure is critical for testing the accuracy of a newly developed force field. To understand whether the hyCG can stabilize native structures of proteins, a 35-residue Villin headpiece protein (PDB entry-code 1YRF) is used to test. We run a hyCG simulation and an AA simulation from the x-ray structure of 1YRF. Both are carried out for 100 ns.
The time evolution of the root of mean squared deviation (RMSD) of the protein backbone related to its native structure, shown in Fig.
To understand the formation of the secondary structure, we study the 1YRF protein again. We perform 250 ns simulation with the hyCG and the AA starting from a complete extended configuration. In order to understand how the non-bonded potential affects the formation efficiency of the secondary structure, we also carry out an energy enhanced sampling simulation on the non-bonded potential as a comparison, which is implemented by selective integrated temperature sampling (SITS).[35]
Two collective variables are chosen to illustrate the formation rate of the helical structure and the correctness of the secondary structure, see Fig.
To obtain enough samples for constructing the landscape in a short time simulation, we study a smaller protein instead of Villin. Here we study the folding of 20-residues Trp-cage protein (PDB entry-code 1L2Y) to understand the tertiary structure formation efficiency in the hyCG. Trp-cage is a widely studied protein due to its small size. To compare the efficiency of the hyCG and the AA force field, we first run N = 4 hyCG and AA simulations at 300 K, respectively, both starting at four different fully extended configurations generated at an equilibrium simulation under 700 K. Each trajectory has τ = 250 ns length. Among these, we find that there are n=2 hyCG trajectories reaching the native structure (with their backbone RMSD smaller than 2 Å) at the time
To investigate whether the folding path is still preserved in the hyCG, we construct the metastable states and their corresponding transitional relationship. A folding path can be described as a transitional network[39] from the unfolded state to the multiple metastable states and to the native state, which can be constructed by the current Markov state model[40–42] or the trajectory mapping method.[43, 44] Each metastable state and the main transitional path are visually shown in Fig.
Although some positions of FES local minimums in the hyCG fail to match strictly to that in the AA model, still, the main topology structure is maintained. Besides, the representative structure of important metastable states has a similar secondary and tertiary structure. Further, it verifies the similarity of their folding paths. However, certain metastable states, the V state in Fig.
We develop a novel hybrid approach, named hyCG, and implement it by combining the AA model and the MARTINI model together to efficiently simulate proteins. The hyCG model accelerates the folding dynamics of the small testing proteins about one order of magnitude without changing the folding pathway much. The saving of the time is from the reduced roughness of the potential energy surface in the hyCG. The maintaining of the shape of the potential energy surface in the CG space is responsible for the accuracy of the hyCG in keeping the folding path of proteins. Intuitively, in traditional hybrid CG models, the gaining of efficiency usually brings a sacrifice in the description of the fine interaction, for example, the interaction among two polar groups will gradually switch to a CG LJ potential, and it might change the stability of the hydrogen bond. This, however, is less concerning in the current hyCG where the average potential is applied as the CG part of the potential energy. Our result strongly evidences the practicability of this treatment in comparison with the faster but less accurate CG and the correct but much slower AA model. This hyCG can approximately achieve both stability and accuracy in the native structure of our test proteins, 1YRF and lL2Y. The formation of the secondary structure and the tertiary structure indicates that it has a higher rate than the AA. The folding landscape and transitional network also conclude the similarity in the folding path between the AA and the hyCG. However, we cannot fully expect that the current version of the hyCG can work well in the hydrogen bond abundant system since the non-bonded parameter in the MARTINI force field is originally assigned to describe the hydrophilic and hydrophobic effect.
In comparison with the pure CG scheme, the hybrid CG approach has a shortcoming, the time step in the simulation is required to be the same as that in the AA model, usually femtoseconds in protein, much smaller than that in the pure CG scheme. Therefore, the hybrid approach is actually an AA force field with reducing friction. The multiple time step method[45] may be applied to directly improve the efficiency of the hyCG. In addition, it is also possible to further accelerate the dynamics of the CG DoF in the hyCG then guide the large-scale conformational motion of AA coordinates.
The authors thank the support from the massive computation cluster in the Institute of Theoretical Physics of Chinese Academy of Sciences. Most long time simulations were carried out on this platform. Y. L. also appreciates the kind help from Shun Xu, who built a friendly and high-performance computational cluster that enabled us to quickly verify this idea in a small system and allowed us to frequently test the code without following the queue.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] | |
[44] | |
[45] | |
[46] |